Summary


Functional approximation is a basic tool for characterizing and analyzing a process of
interest. Data consisting of a set of input-output pairs $(x_i, y_i) \in \mathbb{R}^2$, $i = 1, \ldots, N$, are
recorded and used to build a model of the corresponding process generating function.
Such model construction is a common problem in various scientific fields, including pattern
recognition, computer vision, and applied mathematics. The literature contains
several methods for constructing such functions which involve building models from
linear combinations of nonlinear functions. Examples of such methods include splines,
kernel estimates, neural networks, and radial basis function networks (RBFNs).

Although these methods are commonly employed, they have several significant
drawbacks. For example, splines and kernel estimates require the estimation or approximation
of critical parameters, either without guidelines or at computational expense.
Neural networks and RBFNs have a tendency to overfit the data, and their implementation
often requires numerous adjustments supplied by an experienced user. It
should also be noted that neural networks do not guarantee convergence to an optimal
solution.

Elitist genetic algorithms (GAs), on the other hand, have been proven to converge to an
optimal solution as the number of iterations goes to infinity. GAs are recently developed
search and optimization techniques which have been shown to be efficient and robust and
to provide near-optimal solutions. As such, GAs may represent a viable alternative to the
above methods.

This paper proposes the use of GAs and least squares to fit piecewise linear functions to
data sets in $\mathbb{R}^2$, where the optimal locations of the knots are unknown. A GA designed
to perform such a task is described, along with supporting theory, and demonstrated
on two data sets: one is fit with a single line (and the results compared to the least
squares regression line) and the other is fit with a three-piece piecewise linear function. Our
results show that GAs can indeed yield near-optimal results at limited computational
expense.

Several areas are available for future research. We are currently designing a genetic
algorithm which determines the optimal number of lines as well as the knot placements.
A comparison of the results of GAs to those of related methods, including
those mentioned above, is also planned. Finally, we are exploring the use of GAs in
multivariate situations, such as fitting hyperplanes to data, as an alternative to MARS
and projection pursuit regression.


Abstract 


Genetic algorithms are computational techniques which, given an optimization problem,
use elements of directed and stochastic search to find the "best" solution from the
space of potential solutions. We apply GAs to the problem of fitting the minimum
least-squares piecewise linear function to a set of data points in $\mathbb{R}^2$. We assume that
the number of pieces is known but the knot locations are unknown. The effectiveness
of our algorithm is demonstrated with two examples. Results are found to be quite
promising and encourage further research.
1  Introduction


Function approximation is a basic statistical tool for characterizing and analyzing
some process of interest. Data or measurements, often subject to random error,
are recorded and an approximation of the process generating function is constructed
from this partial information. Constructing a function from a set of input-output
pairs is a common problem in numerous scientific and engineering fields, including
pattern recognition, computer vision, and applied mathematics. The literature
contains several methods for constructing such functions, including splines and
neural networks, using least-squares estimation. However, splines, kernel estimation,
and related methods require the estimation or approximation of critical parameters
either without guidelines or at computational expense. Neural networks also suffer
from this problem and do not guarantee convergence to an optimal solution. As a
consequence, function approximation is an area of ongoing research.

Genetic algorithms are recently developed search and optimization techniques from the
field of artificial intelligence. They have been shown to be efficient and robust, and to
produce near-optimal solutions to problems in areas such as pattern recognition, machine
learning, and statistical classification. This paper proposes the use of genetic
algorithms and least squares to fit piecewise linear functions to data sets in $\mathbb{R}^2$ where
the optimal locations of the knots are unknown. We first present the problem and
discuss current methods for function approximation. We then introduce genetic algorithms
and detail how GAs can be used to fit optimal piecewise-linear functions.
Several examples are presented with results, and areas for future research are mentioned.


2  Problem  Statement  and  Current  Methodology 

We are given data $(x, y)$, $x = (x_1, x_2, \ldots, x_N)$, $y = (y_1, y_2, \ldots, y_N)$, $(x_i, y_i) \in \mathbb{R}^2$ $\forall\, i = 1, \ldots, N$,
$1 < N < \infty$. Define $x_{(1)} = \min\{x_i;\ i = 1, \ldots, N\}$, $x_{(N)} = \max\{x_i;\ i = 1, \ldots, N\}$.
The values $x_i$ and $y_i$ are related by an unknown function $f$ such that
$y_i = f(x_i) + \epsilon_i$, where $\epsilon_i$ is a random error. The problem is to approximate the
function $f$ by a $k$-piecewise linear function $\hat f$, $k$ known, where the knot locations
$z = (z_1, \ldots, z_k)$ are unknown and the least-squares error is to be minimized. We make
no assumptions regarding the smoothness of $f$ or the distributions of $(x, y)$ and $\epsilon$.
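As a concrete reading of the objective, the following minimal sketch evaluates the least-squares error of one candidate piecewise linear function. The representation (sorted interior knots plus one slope/intercept pair per piece) and the name `piecewise_sse` are assumptions made for illustration; they are not the encoding used later in the paper.

```python
from bisect import bisect_right

def piecewise_sse(xs, ys, knots, lines):
    """Sum of squared errors of a piecewise linear function on the data.

    knots: sorted interior breakpoints (hypothetical representation).
    lines: one (slope, intercept) pair per piece, len(lines) == len(knots) + 1.
    A point lying exactly on a knot is assigned to the piece on its right.
    """
    sse = 0.0
    for x, y in zip(xs, ys):
        slope, intercept = lines[bisect_right(knots, x)]
        sse += (y - (slope * x + intercept)) ** 2
    return sse
```

A search procedure for the knots would call such a function once per candidate and keep the candidate with the smallest value.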

Classical approximation theory suggests several methods for solving such a problem,
methods which involve building models from linear combinations of nonlinear
functions. Such linear estimators can be expressed as

$$\hat f(x) = \sum_{i=1}^{N} K_\lambda(x, x_i)\, y_i \qquad (1)$$

where $K_\lambda(x, x_i)$ is a weighting function which depends on some parameter(s) $\lambda$.
Examples include kernel estimates, series approximation (which we will not explicitly
discuss), and spline fitting, as well as the more recent neural network and radial
basis function estimates.
Kernel estimators can be expressed in the above form where the weighting function or
kernel $K_\lambda$ has a simple form independent of the design of $x$. Examples of such weighting
functions are the uniform and triangular kernels. $K_\lambda$ is a bounded function assumed to
have support $[-1, 1]$, with a maximum at zero, and is usually chosen to satisfy certain
moment conditions. The choice of moment conditions determines the order $m$ of the
kernel estimate: conditions on higher order moments lead to higher order estimates.
The parameter $\lambda$ is called the bandwidth and determines (1) the maximum distance
away from $x_i$ a data point can be and still be included in the estimation of $f(x_i)$, and
(2) the amount of emphasis placed on observations at certain distances from $x_i$. Note
that kernel methods require the researcher to select appropriate values for $K_\lambda$, $\lambda$,
and $m$. Although these choices are critical to the quality of the resulting estimate,
they are often made by trial and error or by time-consuming adaptive methods. $\lambda$ is
often chosen using cross-validation (CV), although CV does not guarantee the selection of
the optimal $\lambda$.
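To make the role of the bandwidth concrete, here is a minimal sketch of a kernel estimate with the triangular kernel. Normalising the weights so that they sum to one is an assumption of this sketch (the text does not fix a normalisation), and the function names are hypothetical.

```python
def triangular_kernel(u):
    """Triangular kernel: support [-1, 1], maximum at zero."""
    return max(0.0, 1.0 - abs(u))

def kernel_estimate(x0, xs, ys, bandwidth):
    """Weighted average of ys; the weight of (x_i, y_i) depends on (x0 - x_i) / bandwidth.

    Returns None when no observation lies within one bandwidth of x0,
    since the estimate is then undefined under this normalisation.
    """
    weights = [triangular_kernel((x0 - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return None
    return sum(w * y for w, y in zip(weights, ys)) / total
```

Shrinking `bandwidth` excludes more distant observations, which is the first of the two effects of the bandwidth described above.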

Splines are another class of linear estimators closely related to kernel methods. A
spline of order $L$ with knots at $z_1, \ldots, z_K$ is a function $s$ of the form

$$s(x) = \sum_{i=0}^{L-1} \theta_i x^i + \sum_{i=1}^{K} \delta_i (x - z_i)_+^{L-1} \qquad (2)$$

for $\theta_i \in \mathbb{R}$, $i = 0, \ldots, L-1$, and $\delta_i \in \mathbb{R}$, $i = 1, \ldots, K$. In other words, a spline is a
piecewise polynomial where the pieces are tied at knots in such a way that $s$ satisfies
certain continuity properties (e.g., the first $L - 2$ derivatives are continuous). As
such they can be viewed as an extension of polynomial regression. Different classes
of splines can be formed by using different basis functions, e.g., B-splines, periodic
splines, etc. Splines are useful when we want an estimate which meets a fitness
criterion as well as a smoothness criterion. Hence we may estimate $y$ by choosing $f$ to
minimize

$$\sum_{j=1}^{N} \left(y_j - f(x_j)\right)^2 + \lambda \int_a^b \left(f^{(m)}(x)\right)^2 dx \qquad (3)$$

for $\lambda > 0$, $m \in \mathbb{N}$, and $a \le x_j \le b$ $\forall\, j = 1, \ldots, N$. The solution $\hat f$ is called
a smoothing spline estimate, and $\lambda$ is the smoothing parameter. $\lambda$ determines the
tradeoff between goodness-of-fit and smoothness. Splines have applications in areas
such as computer tomography and military analysis.
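The truncated power form of a spline of order $L$ with given knots can be evaluated directly. The sketch below is illustrative only; the argument names are assumptions, and the order is taken from the number of polynomial coefficients.

```python
def spline_value(x, theta, delta, knots):
    """Evaluate s(x) = sum_i theta[i] * x**i + sum_i delta[i] * (x - z_i)_+ ** (L - 1).

    theta: polynomial coefficients theta_0 .. theta_{L-1}, so L = len(theta).
    delta: one coefficient per knot; knots: the knot locations z_i.
    """
    order = len(theta)
    poly = sum(t * x ** i for i, t in enumerate(theta))
    # (x - z)_+ is zero to the left of the knot, so those terms are skipped.
    trunc = sum(d * (x - z) ** (order - 1) for d, z in zip(delta, knots) if x > z)
    return poly + trunc
```

With `len(theta) == 2` this is a continuous piecewise linear function whose slope changes by `delta[i]` at `knots[i]`, the case closest to the piecewise linear fits studied in this paper.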

To use (smoothing) splines for analysis, the order $L$ of the spline, the number and
location of the knots, and $\lambda$ need to be determined (as well as the choice of basis
and the smoothing criterion). Finding a good estimate for $\lambda$ is computationally
demanding, and $m$ is often based on prior information as opposed to theoretical
considerations. Schwetlick and Schütze describe an algorithm which optimizes the
location and number of 'free' knots but is computationally 'too expensive' and involves
the approximation of various parameters whose effects on the final estimate are unknown.
Larson finds a closed form for the minimizing abscissa for unknown knot
locations, but does not mention the optimization of the number of knots.

A recent development in functional approximation is the use of neural networks (NNs)
and radial basis functions (RBFs). Multilayer neural networks are linear (in the sense
of (1)) function approximators of the form, e.g.,

$$f(x_k, W) = \sum_{j=1}^{M} \beta_j\, g_j(\alpha_j x_k) \qquad (4)$$

where $\beta_j$, $1 \le j \le M$, are the weights connecting the $M$ hidden units to the output unit,
$\alpha_j$, $1 \le j \le M$, are the weights connecting the input layer unit to the $j$th hidden layer unit,
and the $g_j$'s are the hidden layer activation functions. $W$ is the matrix of network
weights. A special case is based on radial basis functions, where the approximation is
produced by passing each $x_i$ through a set of basis functions, each containing an RBF
center, multiplying the result by a coefficient, and then summing the results. In other
words,

$$f(x_k, W) = w_0 + \sum_{j=1}^{M} w_j\, \phi\!\left(\lVert x_k - c_j \rVert / r\right) \qquad (5)$$

where $\phi$ is the radial basis function, $\{c_j\}$ is the set of RBF centers, and $r$ is a scale
parameter. Often $\phi$ corresponds to a Gaussian density. Note that a radial basis
function network (RBFN) is essentially a kernel method for regression. NNs are
easily programmed and, as a result, have become an almost universal optimization
'crank': simply toss in the data, add any number of parameters, and wait for gradient
descent to produce the result. NNs do require numerous adjustments, supplied by an
experienced user, and have a tendency to overfit or overparameterize the data.
They may also get stuck in local minima, unlike GAs. Chen and Jain report
that backpropagation can be slow and sensitive to noise. They suggest a robust
modification whose parameters are the focus of further study. RBFNs have been
shown to outperform multilayer perceptrons (MLPs) even though the choice of centers and the curse of
dimensionality can make implementation difficult. It should also be noted that both
NN and RBFN results lack interpretability.
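For a scalar input, the RBFN form in (5) reduces to a few lines. The sketch below assumes an unnormalised Gaussian shape $\phi(u) = \exp(-u^2/2)$; the text only says that $\phi$ often corresponds to a Gaussian density, so the exact constant is an assumption, as are the argument names.

```python
import math

def rbfn_output(x, centers, weights, w0, r):
    """Evaluate w0 + sum_j w_j * phi(|x - c_j| / r) for a scalar input x.

    phi(u) = exp(-u**2 / 2) is an assumed (unnormalised) Gaussian shape.
    centers: RBF centers c_j; weights: coefficients w_j; r: scale parameter.
    """
    return w0 + sum(
        w * math.exp(-0.5 * (abs(x - c) / r) ** 2)
        for w, c in zip(weights, centers)
    )
```

The centers and the scale `r` play the same role as the design points and the bandwidth in a kernel estimate, which is why an RBFN can be read as a kernel method for regression.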

Our preliminary studies indicate that GAs may represent a viable alternative to the
above methods.


3  Genetic  Algorithms 


Genetic algorithms are stochastic search methods which provide a near-optimal solution
to the evaluation function of an optimization problem. They can be used to
search complex, multimodal surfaces via steps based on the processes of natural genetic
systems. They are designed to work simultaneously on a group of possible solutions
(parallelism), which helps prevent the algorithm from getting stuck in a local optimum.
Their effectiveness has been shown in numerous problem solving applications, including
scheduling, classifier systems, and pattern recognition.

Each possible solution is encoded as a string or chromosome; a set of such chromosomes
is called a population. An evaluation (fitness) function provides a mapping from the
chromosome space to the solution space. GAs start with an initial population of a
fixed number of randomly generated strings. At each iteration, three basic operations
(selection, crossover, and mutation) are applied over the current population to yield
a new population of strings. This cycle is repeated until some termination criterion is
achieved, at which time the best string achieved is generally taken as the solution to
the optimization problem.


3.1  Example 

For a more detailed look at this process, we will detail the stages of one GA model, the
elitist model. Consider the problem of maximizing a function $f(x)$, $x \in D$, where $D$
is a finite set and $f(x) > 0$ $\forall\, x \in D$. Each string $S$, built from members of a finite
alphabet $A = \{\alpha_1, \ldots, \alpha_a\}$, corresponds to a value $x$ in $D$ and may be written as

$$S = (\gamma_0, \gamma_1, \ldots, \gamma_L); \quad \gamma_i \in A \ \ \forall\, i = 0, \ldots, L$$

The number of different strings that are possible is $a^{L+1}$. A random sample of size $M$
($M$ even) is drawn from these possible strings with replacement to form the initial
population, $Q$. The evaluation or fitness value of each string $S$ is $fit(S) = f(x)$ where
$x \in D$ is the value represented by $S$.

The first operation, selection, is modeled after Darwin's concept of 'survival of the
fittest'. Strings from the population are selected and placed in a mating pool; the
probability of selection for string $j$ is $b_j = fit(S_j) / \sum_{i=1}^{M} fit(S_i)$. For example, if
$B_j = \sum_{k=1}^{j} b_k$, $M$ strings are selected and placed in the mating pool by repeating the
following process for $i = 1, \ldots, M$:

1. Generate a random number $rnd_i$ from $[0, 1]$.

2. If $rnd_i \le B_1$, select $S_1$; for $j = 2, \ldots, M$, if $B_{j-1} < rnd_i \le B_j$, select $S_j$.

Note that strings with low fitness values are rarely selected while some strings may be
selected more than once. We denote the mating pool, our new population, as $Q_1$.
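The selection step can be sketched as follows. This is an illustration under assumptions: the function name `select` and the use of a binary search over the cumulative probabilities $B_j$ are choices made here, not details given in the text.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def select(population, fitness, rng):
    """Fitness-proportional selection of len(population) strings.

    population: list of strings; fitness: their (positive) fitness values;
    rng: a random.Random instance.
    """
    total = sum(fitness)
    cumulative = list(accumulate(f / total for f in fitness))  # B_1, ..., B_M
    pool = []
    for _ in range(len(population)):
        rnd = rng.random()
        # Smallest j with rnd <= B_j, i.e. B_{j-1} < rnd <= B_j.
        j = bisect_left(cumulative, rnd)
        # Guard against rounding leaving the last B_M slightly below 1.
        pool.append(population[min(j, len(population) - 1)])
    return pool
```

For example, `select(["00", "01", "11"], [1.0, 2.0, 5.0], random.Random(0))` returns a mating pool of three strings in which `"11"` is the most likely entry.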

In single point crossover, or reproduction, pairs of strings exchange information, thereby
generating two new offspring for the next population. All strings are paired at random
in such a way that each string belongs to only one pair (hence there are $M/2$ pairs).
Let the given pair be denoted as

$$\beta = (\beta_1, \ldots, \beta_L) \quad \text{and} \quad \tau = (\tau_1, \ldots, \tau_L)$$

and let $p_c$ be the probability that a given pair of strings undergoes crossover. Then
the crossover operation on a given pair may be described as

1. Generate a random number $rnd$ from $[0, 1]$.

2. If $rnd < p_c$, then generate a random integer $pos$ from $[1, L-1]$.

3. Strings $\beta$ and $\tau$ are replaced by strings $\beta'$ and $\tau'$ where

$$\beta' = (\beta_1, \ldots, \beta_{pos}, \tau_{pos+1}, \ldots, \tau_L) \quad \text{and} \quad \tau' = (\tau_1, \ldots, \tau_{pos}, \beta_{pos+1}, \ldots, \beta_L)$$
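A minimal sketch of single point crossover on one pair, assuming the strings are Python sequences of equal length (the function name and argument order are illustrative):

```python
import random

def crossover(beta, tau, p_c, rng):
    """Single point crossover of two equal-length strings.

    With probability p_c, a cut position pos in [1, L-1] is drawn and the
    tails of the two strings are exchanged; otherwise the pair is returned
    unchanged. rng is a random.Random instance.
    """
    if len(beta) >= 2 and rng.random() < p_c:
        pos = rng.randint(1, len(beta) - 1)  # inclusive on both ends
        return beta[:pos] + tau[pos:], tau[:pos] + beta[pos:]
    return beta, tau
```

Because `pos` never equals 0 or `L`, each offspring always keeps at least one character from each parent when crossover occurs.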




The resulting population is denoted $Q_2$. Note that $Q_2$ has $M$ strings, some of which
may have also been elements of $Q_1$.

Mutation involves the random altering of characters in the chromosomes (strings) of
$Q_2$. Let $p_m$ denote the probability of mutation of a given character. Then, for each
character $\beta_i$ of every string, the mutation stage consists of

1. Generate a random number $rnd$ from $[0, 1]$.

2. If $rnd < p_m$, mutate character $\beta_i$ by replacing it at random with an element from
$A - \{\beta_i\}$.


Note that through mutation, a given string can become any of the possible strings.
The mutation probability may vary over iterations, initially taking a high value, then
decreasing to a pre-specified minimum, then increasing again in the later stages of the
algorithm. When the algorithm has little knowledge of the search space, it
is encouraged to explore its domain through a high mutation probability. As the
number of iterations increases the algorithm will move towards a solution, hence the
mutation probability is decreased to allow a search of the vicinity of this solution.
To avoid the convergence of the algorithm to a local optimum, the mutation probability
is increased in the later stages to again allow for a more random search. The resulting
population we denote as $Q_3$.
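The mutation stage and a varying mutation probability can be sketched as below. The V-shaped schedule, its default values, and both function names are hypothetical: the text states only that the probability starts high, falls to a pre-specified minimum, and rises again.

```python
import random

def mutate(string, alphabet, p_m, rng):
    """Replace each character, with probability p_m, by a different one from alphabet."""
    out = []
    for ch in string:
        if rng.random() < p_m:
            ch = rng.choice([a for a in alphabet if a != ch])
        out.append(ch)
    return "".join(out)

def mutation_probability(t, n_iter, p_high=0.3, p_low=0.01):
    """Hypothetical schedule: p_high at t = 0, p_low at the midpoint, p_high at t = n_iter."""
    half = n_iter / 2
    return p_low + (p_high - p_low) * abs(t - half) / half
```

In a run of `n_iter` iterations one would call `mutate(s, "01", mutation_probability(t, n_iter), rng)` at iteration `t`.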

We now replace our initial $Q$ with $Q_3$ and repeat the above stages until the algorithm
converges to a satisfactory solution. The stages we have discussed so far are common
to all GA models. In the elitist model of GAs (EGA), a further operation, elitism, is
added to ensure that knowledge about the best string obtained so far is preserved. In
this way the algorithm can report at any time the best solution achieved during the
entire process. The basic steps of the elitist model are

1. Generate an initial population $Q$ and find the fitness value of each string $S$ in $Q$.

2. Find the string $S_{maxQ}$ in $Q$ with the maximum fitness value $fit_{maxQ}$ of all of the
strings in $Q$.

3. Perform selection on $Q$ yielding $Q_1$.

4. Perform crossover on $Q_1$ yielding $Q_2$.

5. Perform mutation on $Q_2$ yielding $Q_3$.

6. (elitism) Compare the fitness value of each string in $Q_3$ with the fitness value of
$S_{maxQ}$. If no string in $Q_3$ has a fitness value greater than or equal to $fit_{maxQ}$,
replace the worst string in $Q_3$ with $S_{maxQ}$.

7. Replace $Q$ with $Q_3$ and go to step 2.
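The seven steps above can be sketched as a single loop. This is a toy illustration under assumptions, not the implementation used in this paper: strings are binary, the population size is even, the fitness function is strictly positive, and a fixed iteration count stands in for the stopping rule.

```python
import random

def elitist_ga(fit, length, pop_size, n_iter, p_c, p_m, seed=0):
    """Elitist GA on binary strings of a given length (>= 2); returns the best string."""
    rng = random.Random(seed)
    # Step 1: random initial population Q.
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    best = max(pop, key=fit)  # Step 2: S_maxQ
    for _ in range(n_iter):
        # Step 3: fitness-proportional selection (Q -> Q1).
        pool = rng.choices(pop, weights=[fit(s) for s in pop], k=pop_size)
        # Step 4: single point crossover on random disjoint pairs (Q1 -> Q2).
        rng.shuffle(pool)
        nxt = []
        for a, b in zip(pool[0::2], pool[1::2]):
            if rng.random() < p_c:
                pos = rng.randint(1, length - 1)
                a, b = a[:pos] + b[pos:], b[:pos] + a[pos:]
            nxt += [a, b]
        # Step 5: mutation, here a bit flip with probability p_m (Q2 -> Q3).
        nxt = [
            "".join(("1" if c == "0" else "0") if rng.random() < p_m else c for c in s)
            for s in nxt
        ]
        # Step 6: elitism, restore S_maxQ if Q3 has nothing at least as fit.
        if max(fit(s) for s in nxt) < fit(best):
            worst = min(range(pop_size), key=lambda i: fit(nxt[i]))
            nxt[worst] = best
        # Step 7: Q <- Q3, then step 2 again.
        pop = nxt
        best = max(pop, key=fit)
    return best
```

For instance, `elitist_ga(lambda s: 1 + s.count("1"), 8, 10, 25, 0.8, 0.05)` searches for the 8-bit string with the most ones. Because of step 6, the fitness of the reported string never decreases from one iteration to the next.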
3.2  Remarks


3.2.1  Stopping  Rules  and  Convergence 

With any optimization technique, it is important to ensure that the process will lead to
the optimal solution. It has been theoretically proven that elitist genetic algorithms
will converge to the optimal solution as the number of iterations, $n$, goes to infinity.
However, in practice, $n$ is finite so a stopping rule is used to determine when the
algorithm has reached an acceptable solution. There is, in general, no stopping rule in
the literature which will ensure the convergence of GAs to the optimal solution. Two
common stopping rules are


•  Execute  the  process  for  a  fixed  number  of  iterations  and  report  the  best  string 
found  as  the  solution. 

•  Execute  the  process  until  the  fitness  value  does  not  show  adequate  improvement 
over  a  fixed  number  of  iterations,  and  report  the  best  string  found  as  the  solution. 
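Both stopping rules can be combined in one check, sketched below. The names `patience` and `min_gain` are hypothetical parameters standing in for "a fixed number of iterations" and "adequate improvement"; the text does not prescribe values for either.

```python
def should_stop(best_history, max_iter, patience, min_gain):
    """Decide whether to stop, given the best fitness recorded at each iteration so far.

    Rule 1: stop once max_iter iterations have been run.
    Rule 2: stop if the best fitness improved by less than min_gain
            over the last `patience` iterations.
    """
    n = len(best_history)
    if n >= max_iter:
        return True
    if n > patience and best_history[-1] - best_history[-1 - patience] < min_gain:
        return True
    return False
```

In either case the best string found so far is reported as the solution.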


The rate of convergence of GAs depends on $M$, $p_c$, and $p_m$. Hence the values of these
parameters must be chosen properly. Note, however, that the proof of convergence to
the optimal solution does not depend on the parameter values, i.e., the GA will converge
to the optimum as $n$ goes to infinity regardless of the parameter values chosen.


3.2.2  Pattern  Classification 

Recently, several applications of genetic algorithms in the field of pattern classification
have been reported. Classification is the problem of finding a decision boundary
that can correctly distinguish between different classes in the feature space. Given a
set of data points in $\mathbb{R}^N$, $N > 1$, genetic algorithms can be used to perform this
task by, for example, allowing each string to represent a decision boundary formed
by a set of lines or hyperplanes. A fitness function which takes larger values for
smaller numbers of misclassifications is then maximized. Usually the optimal decision
boundary is nonlinear, so the task is to approximate the optimal boundary with a set
of linear segments. The algorithm is run until a decision boundary with an acceptable
number of misclassifications is found.

The application of GAs for classification is similar to the application discussed in
this paper. Here, each string also represents a set of lines and the string which best
approximates the optimal solution is reported as the result. Our interest, however, is
focused on finding a piecewise linear function which will minimize the squared distance
of the data points from the function, and not on dividing the data into distinct classes.
4  Theory of Line Fitting in $\mathbb{R}^2$


4.1  Mathematical  Formulation 


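Let $(x, y)$ be the given data set, $x = (x_1, x_2, \ldots, x_N)$, $y = (y_1, y_2, \ldots, y_N)$, $(x_i, y_i) \in \mathbb{R}^2$
$\forall\, i = 1, \ldots, N$, $1 < N < \infty$. Define $x_{(1)} = \min\{x_i,\ i = 1, \ldots, N\}$, $x_{(N)} = \max\{x_i,\ i = 1, \ldots, N\}$,
$y_{(1)} = \min\{y_i,\ i = 1, \ldots, N\}$, $y_{(N)} = \max\{y_i,\ i = 1, \ldots, N\}$.
Let $\mathcal{L}_{k_0}$ represent the class of all $k_0$-piecewise linear functions in $\mathbb{R}^2$ that can
be expressed in the following form:

$$L_{jk_0}(x) = \begin{cases} L_{j1k_0}(x), & x_{(j1)} \le x < x_{(j2)} \\ L_{j2k_0}(x), & x_{(j2)} \le x < x_{(j3)} \\ \quad\vdots \\ L_{jk_0k_0}(x), & x_{(jk_0)} \le x \le x_{(j(k_0+1))} \end{cases} \qquad (6)$$

where $x_{(j1)} < x_{(j2)} < \cdots < x_{(jk_0)} < x_{(j(k_0+1))}$, $x_{(j1)} = x_{(1)}$, $x_{(j(k_0+1))} = x_{(N)}$, and each
$L_{jik_0}$, $i = 1, \ldots, k_0$, can be expressed as

$$x \cos\theta_{ji} + y \sin\theta_{ji} = d_{ji}, \quad 0 \le \theta_{ji} < \pi, \quad d_{ji} \in \mathbb{R}$$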

where $\theta_{ji}$ ($0 \le \theta_{ji} < \pi$) is the polar angle formed when the polar axis is the $y$-axis
and the origin is the intersection point between the $y$-axis and $L_{jik_0}$, and $d_{ji}$ is the
perpendicular distance of the line from the origin $(0, 0)$. The number of elements
$L_{jk_0} \in \mathcal{L}_{k_0}$ is uncountable. However, we can restrict the class of functions under
consideration to a finite (discrete) set by restricting the values of $\theta$ and $d$. Let $l_1$ be
the number of bits used to express $\theta$ and let $l_2$ be the number of bits used to express
$d$. Note that the precision of the line is determined by both $l_1$ and $l_2$. We restrict $\theta_{ji}$
to the values $\{0, \pi/2^{l_1}, 2\pi/2^{l_1}, \ldots, (2^{l_1} - 1)\pi/2^{l_1}\}$; in specifying $d_{ji}$, we utilize the rectangle $rect$
formed by the points $(x_{(1)}, y_{(1)})$, $(x_{(N)}, y_{(1)})$, $(x_{(1)}, y_{(N)})$ and $(x_{(N)}, y_{(N)})$. Note that
$rect$ contains the entire data set. Let $diag$ be the maximum diagonal of $rect$ and let $l_\theta$
be defined as

$$l_\theta = \begin{cases} x_{(1)} \cos\theta + y_{(1)} \sin\theta & \text{if } 0 \le \theta \le \pi/2 \\ x_{(N)} \cos\theta + y_{(1)} \sin\theta & \text{if } \pi/2 < \theta < \pi \end{cases} \qquad (7)$$

Then for a given $\theta_{ji}$, $d_{ji}$ may only take values within the set $\{d_{ji} = l_{\theta_{ji}} + k_{ji}\delta : k_{ji} \in
\{0, 1, \ldots, 2^{l_2} - 1\},\ \delta = diag/(2^{l_2} - 1)\}$. Let $\mathcal{L}^{\delta}_{k_0}$ denote the finite set of functions in
$\mathcal{L}_{k_0}$ which satisfy these restrictions. Then $\mathcal{L}^{\delta}_{k_0}$ may be expressed as

$\mathcal{L}^{\delta}_{k_0} = \{ L_{jk_0}(x) : L_{jk_0}(x) \in \mathcal{L}_{k_0}$,
$L_{jik_0}$ is of the form $x \cos\theta_{ji} + y \sin\theta_{ji} = l_{\theta_{ji}} + k_{ji}\delta$ $\forall\, i = 1, \ldots, k_0$,
$\theta_{ji} \in \{0, \pi/2^{l_1}, \ldots, (2^{l_1} - 1)\pi/2^{l_1}\}$, $k_{ji} \in \{0, 1, \ldots, 2^{l_2} - 1\}$, and $\delta = diag/(2^{l_2} - 1) \}$.


Note 1  A line with $d = l_\theta$ intersects $rect$ at the point $(x_{(1)}, y_{(1)})$, if $0 \le \theta \le \pi/2$, or
the point $(x_{(N)}, y_{(1)})$, if $\pi/2 < \theta < \pi$. The parameter $k_{ji}\delta$, $0 \le k_{ji}\delta \le diag$, is
sometimes referred to as the offset value.
Note 2  For fixed $i$ and $\theta_{ji}$ the lines $L_{jik_0}$, $k_{ji} = 0, \ldots, 2^{l_2} - 1$, are parallel and evenly
spaced across the area covered by $rect$.

Note 3  If $l_1^* \ge l_1$ and $l_2^* \ge l_2$, then $\mathcal{L}^{\delta}_{k_0} \subseteq \mathcal{L}^{\delta *}_{k_0}$, where $\mathcal{L}^{\delta}_{k_0}$ corresponds to $l_1$ and $l_2$ and
$\mathcal{L}^{\delta *}_{k_0}$ corresponds to $l_1^*$ and $l_2^*$.
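The discretization of a single line can be illustrated by decoding a pair of bit strings into $(\theta, d)$. This sketch rests on several assumptions: the bit strings are read as unsigned binary integers, the bit counts are written `l1` and `l2`, the offset base for an angle uses $x\cos\theta + y\sin\theta$ evaluated at a corner of the bounding rectangle (consistent with the line equation and Note 1), and the function name is hypothetical.

```python
import math

def decode_line(theta_bits, k_bits, x_min, x_max, y_min, y_max):
    """Decode bit strings into (theta, d) for the line x*cos(theta) + y*sin(theta) = d.

    theta takes one of 2**l1 evenly spaced values in [0, pi); d is the offset
    base for that angle plus k * delta, with delta = diag / (2**l2 - 1).
    """
    l1, l2 = len(theta_bits), len(k_bits)
    theta = int(theta_bits, 2) * math.pi / 2 ** l1
    diag = math.hypot(x_max - x_min, y_max - y_min)  # diagonal of the bounding rectangle
    delta = diag / (2 ** l2 - 1)
    # Corner of the rectangle that the line touches when k = 0 (assumed from Note 1).
    base_x = x_min if theta <= math.pi / 2 else x_max
    base = base_x * math.cos(theta) + y_min * math.sin(theta)
    return theta, base + int(k_bits, 2) * delta
```

Adding one bit to either string doubles the number of admissible angles or offsets, which is the refinement that Note 3 describes.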


Figure 1 represents a sample data set with several functions from $\mathcal{L}^{\delta}_{k_0}$. For the sake of
clarity, we will henceforth specify $\mathcal{L}^{\delta}_{k_0}$ as $\mathcal{L}^{\delta}_{k_0}(\Theta, \mathcal{K})$, where $\theta$ has $\Theta$ possible values
and $k$ has $\mathcal{K}$ possible values (note that $\Theta = 2^{l_1}$ and $\mathcal{K} = 2^{l_2}$), and specify $L_{jk_0}(x)$ as
$L(\theta_{jk_0}, k_{jk_0})$. Our goal is to use genetic algorithms to find the minimum least-squares
$k_0$-piecewise linear function where $k_0$ is known. This is possible if and only if


1. Our search space $\mathcal{L}^{\delta}_{k_0}(\Theta, \mathcal{K})$ contains the optimal solution, i.e., the minimum
least-squares function.

2. The algorithm converges to this optimal solution


as $\Theta \to \infty$ and $\mathcal{K} \to \infty$. We first determine whether these conditions are met when
$k_0 = 1$.
4.2  Case $k_0 = 1$


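In the case where $k_0 = 1$, we would like our optimal string to represent the minimum
least-squares line, i.e., the line whose fitted values $y_0$ satisfy

$$\sum_{i=1}^{N} (y_{0i} - y_i)^2 = \min\left\{ \sum_{i=1}^{N} \left( L(\theta_{j1}, k_{j1})(x_i) - y_i \right)^2 : L_{j1} \in \mathcal{L}_1 \right\} \qquad (8)$$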


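The least-squares line is known as $y = \beta_1 x + \beta_0$, where

$$\beta_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

Note that the least-squares line intersects $rect$ since it passes through the point
$(\bar{x}, \bar{y})$.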
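For comparison with a GA result, the least-squares line can be computed in closed form and converted to the normal form $x\cos\theta + y\sin\theta = d$ used in Section 4.1. The sketch below is illustrative; the function names are assumptions, and the conversion uses the unit normal $(-\beta_1, 1)/\sqrt{1 + \beta_1^2}$, which gives $0 < \theta < \pi$.

```python
import math

def least_squares_line(xs, ys):
    """Return (beta0, beta1) of the least-squares line y = beta1 * x + beta0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    return y_bar - beta1 * x_bar, beta1

def to_normal_form(beta0, beta1):
    """Rewrite y = beta1 * x + beta0 as x*cos(theta) + y*sin(theta) = d."""
    norm = math.hypot(1.0, beta1)
    theta = math.atan2(1.0, -beta1)  # sin(theta) > 0, so 0 < theta < pi
    return theta, beta0 / norm
```

The pair returned by `to_normal_form` is the continuous target that the best discretized line should approach as the numbers of admissible angles and offsets grow.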

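Let

$\mathcal{L}^{\delta}_1(\Theta, \mathcal{K}) = \{ L(\theta_{j1}, k_{j1}) : L(\theta_{j1}, k_{j1}) \in \mathcal{L}_1$, $L(\theta_{j1}, k_{j1})$ is of the form
$x \cos\theta_{j1} + y \sin\theta_{j1} = l_{\theta_{j1}} + k_{j1}\delta$, where
$\theta_{j1}$ is one of $\Theta$ values, $k_{j1}$ is one of $\mathcal{K}$ values $\}$

and let

$B_1 = \{ L(\theta_{m1}, k_{m1}) : L(\theta_{m1}, k_{m1}) \in \mathcal{L}_1$, $\exists\, (x_r, y_r) \in rect$ satisfying $L(\theta_{m1}, k_{m1})$,
$0 \le \theta_{m1} < \pi$, $k_{m1} \in \mathbb{R} \}$.

Figures 2 and 3 show lines from $\mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$ for a sample data set. We shall prove that
our class $\mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$ will contain the least-squares line as $\Theta \to \infty$ and $\mathcal{K} \to \infty$.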

For simplicity, let $\rho = l_\theta + k\delta$ for given $\theta$ and $k$.


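Proposition 4.1  Let $L(\theta_{m1}, k_{m1}) \in B_1$. Let $\varepsilon > 0$. Then $\exists\, (\Theta_\varepsilon, \mathcal{K}_\varepsilon)$ : $\forall\, \Theta \ge \Theta_\varepsilon$
and $\mathcal{K} \ge \mathcal{K}_\varepsilon$, $\exists\, L(\theta, k)$ :


1. $L(\theta, k) \in \mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$

2. $|\theta - \theta_{m1}| < \varepsilon/2$ and $|\rho_{m1} - \rho| < \varepsilon/2$


Proof: Let $L(\theta_{m1}, k_{m1}) \in B_1$ and $\varepsilon > 0$ be given. Choose $\Theta_\varepsilon$ : $\pi/2^{l_1} = \pi/\Theta_\varepsilon < \varepsilon/2$.
Similarly, choose $\mathcal{K}_\varepsilon$ : $\delta = diag/(2^{l_2} - 1) = diag/(\mathcal{K}_\varepsilon - 1) < \varepsilon/2$. Then $\exists\, L(\theta, k) \in
\mathcal{L}^{\delta}_1(\Theta_\varepsilon, \mathcal{K}_\varepsilon)$ : 2. is satisfied. By Note 3 above, if $L(\theta, k) \in \mathcal{L}^{\delta}_1(\Theta_\varepsilon, \mathcal{K}_\varepsilon)$, then $L(\theta, k) \in
\mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$ $\forall\, \Theta \ge \Theta_\varepsilon$ and $\forall\, \mathcal{K} \ge \mathcal{K}_\varepsilon$. Hence 1. is satisfied.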


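Proposition 4.2  For each $\varepsilon > 0$, $\exists\, (\Theta_\varepsilon, \mathcal{K}_\varepsilon)$ : for all $\Theta \ge \Theta_\varepsilon$ and for all $\mathcal{K} \ge \mathcal{K}_\varepsilon$,
given any $L(\theta_{m1}, k_{m1}) \in B_1$, $\exists\, L(\theta, k)$ :


1. $L(\theta, k) \in \mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$

2. $|\theta - \theta_{m1}| < \varepsilon/2$ and $|\rho_{m1} - \rho| < \varepsilon/2$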

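Proof: Let $\varepsilon > 0$ be given. Choose $\Theta_\varepsilon$ : $\pi/2^{l_1} = \pi/\Theta_\varepsilon < \varepsilon/2$. Then for $\theta_{\varepsilon(i)} \in
\{0, \pi/\Theta_\varepsilon, \ldots, (\Theta_\varepsilon - 1)\pi/\Theta_\varepsilon\}$, $\theta_{\varepsilon(i)} < \theta_{\varepsilon(i+1)}$ $\forall\, i$, $i = 1, \ldots, \Theta_\varepsilon$, we have $|0 - \theta_{\varepsilon(1)}| < \varepsilon/2$,
$|\theta_{\varepsilon(1)} - \theta_{\varepsilon(2)}| < \varepsilon/2$, $\ldots$, $|\theta_{\varepsilon(\Theta_\varepsilon)} - \pi| < \varepsilon/2$. So given any $L(\theta_{m1}, k_{m1}) \in B_1$ we can
choose $\Theta_\varepsilon$ so that $\exists\, \theta_{\varepsilon(i)} \in \{0, \pi/\Theta_\varepsilon, \ldots, (\Theta_\varepsilon - 1)\pi/\Theta_\varepsilon\}$ : $|\theta_{m1} - \theta_{\varepsilon(i)}| < \varepsilon/2$.

For any angle $\theta \in [n\pi/\Theta_\varepsilon, (n+1)\pi/\Theta_\varepsilon]$, $n = 0, \ldots, \Theta_\varepsilon - 2$, the corresponding $\rho \in
[\tau_{\varepsilon n_1}, \tau_{\varepsilon n_2}]$, with $\tau_{\varepsilon n_1} < \tau_{\varepsilon n_2}$. Find

$$\nu = \max\left\{ \sup\left\{ \tau_{\varepsilon n_2} - \tau_{\varepsilon n_1} \right\},\ n = 0, \ldots, \Theta_\varepsilon - 2 \right\}$$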
Figure 3: A data set with lines from $\mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$, $\theta_{j1} < \pi/2$


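Choose $\mathcal{K}_\varepsilon$ : $\nu/\mathcal{K}_\varepsilon < \varepsilon/4$ and $\mathcal{K}_\varepsilon = 2^{l_2}$ for some $l_2 \in \mathbb{N}$. Then given any $L(\theta_{m1}, k_{m1}) \in
B_1$ we can choose $\mathcal{K}_\varepsilon$ so that $\exists\, k \in \{0, \ldots, 2^{l_2} - 1\}$ : $|\rho - \rho_{m1}| < \varepsilon/2$.

Hence given any $\varepsilon > 0$ and $L(\theta_{m1}, k_{m1}) \in B_1$ we can find $\Theta_\varepsilon$ and $\mathcal{K}_\varepsilon$ so that
$\exists\, L(\theta, k) \in \mathcal{L}^{\delta}_1(\Theta_\varepsilon, \mathcal{K}_\varepsilon)$ : $|\theta - \theta_{m1}| < \varepsilon/2$ and $|\rho_{m1} - \rho| < \varepsilon/2$.

If $L(\theta, k) \in \mathcal{L}^{\delta}_1(\Theta_\varepsilon, \mathcal{K}_\varepsilon)$, then $L(\theta, k) \in \mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$ $\forall\, \Theta \ge \Theta_\varepsilon$ and $\forall\, \mathcal{K} \ge \mathcal{K}_\varepsilon$
by Note 3 above. Hence 1. and 2. are satisfied.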

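Let $\{\Theta_i,\ i = 1, 2, \ldots\}$ and $\{\mathcal{K}_i,\ i = 1, 2, \ldots\}$ represent the possible values of $\Theta$ and $\mathcal{K}$.
Let $L(\theta^*_{m1}, k^*_{m1})$ be the best line in $B_1$ and let $L(\theta^*_{i1}, k^*_{i1})$ be the best line in $\mathcal{L}^{\delta}_1(\Theta_i, \mathcal{K}_i)$.

We would like $\theta^*_{i1} \to \theta^*_{m1}$ and $k^*_{i1} \to k^*_{m1}$ (hence $\rho^*_{i1} \to \rho^*_{m1}$) as $i \to \infty$. To show this,
we need one final result.

Define $C_1 = \bigcup_{i=1}^{\infty} \mathcal{L}^{\delta}_1(\Theta_i, \mathcal{K}_i)$. Note $B_1 \subset C_1$.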

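Theorem 4.1  For each $i = 1, 2, \ldots$, let $L(\theta_{n_i}, k_{n_i}) \in \mathcal{L}^{\delta}_1(\Theta_i, \mathcal{K}_i)$ :
$\theta_{n_i} \to \theta_{lim}$ and $k_{n_i} \to k_{lim}$ for some $\theta_{lim}$, $0 \le \theta_{lim} \le \pi$, and $k_{lim}$, $0 \le k_{lim} < \infty$. Let

$\mathcal{D}_1 = \{ L(\theta_{lim}, k_{lim}) : \exists$ a sequence $L(\theta_{n_i}, k_{n_i}) \in \mathcal{L}^{\delta}_1(\Theta_i, \mathcal{K}_i)$
such that $\theta_{n_i} \to \theta_{lim}$ and $k_{n_i} \to k_{lim} \}$

Then the best line in $B_1$ is the best line in $\mathcal{D}_1$.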


Proof: Note that $C_1 \subset \mathcal{D}_1$ and $B_1 \subset \mathcal{D}_1$. Since the minimal least squares line must
pass through $(\bar{x}, \bar{y})$ and $(\bar{x}, \bar{y}) \in rect$, the best line in $B_1$ = the best line in $\mathcal{D}_1$.

For the following claim, we assume the optimal line (i.e., the line which maximizes the
fitness function or, in our case, minimizes the least squares error) is unique and that
the fitness function $f$ on $[0, \pi] \times [-M, M]$ is continuous where, for any line in
$\mathcal{D}_1$, the distance of the line from the origin is less than $M$.


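Proposition 4.3  Let $L(\theta^*_{m1}, k^*_{m1})$ be the best line in $\mathcal{D}_1$ and for each $i$, $i = 1, 2, \ldots$, let
$L(\theta^*_{i1}, k^*_{i1})$ be the best line in $\mathcal{L}^{\delta}_1(\Theta_i, \mathcal{K}_i)$. Then $\theta^*_{i1} \to \theta^*_{m1}$ and $k^*_{i1} \to k^*_{m1}$ as $i \to \infty$.


Proof: … then $f(\theta_{i1}, k_{i1}) \to f(\theta^*_{m1}, k^*_{m1})$.

Define $E_i = \{(\theta_{i1}, k_{i1}) : d((\theta_{i1}, k_{i1}), (\theta^*_{m1}, k^*_{m1})) < 1/j_i\}$ where $j_i$ is chosen so that
if $(\theta_{i1}, k_{i1}) \in E_i$ and $(\theta_{j1}, k_{j1}) \in E_i^c$ then $f(\theta_{i1}, k_{i1}) > f(\theta_{j1}, k_{j1})$, and $j_i \to \infty$ as
$i \to \infty$. Such sets $E_i$ exist since the optimum is unique.

Note that for each $i$, $\exists\, \varepsilon_i > 0$ : $\exists\, L(\theta_{i1}, k_{i1}) \in \mathcal{L}^{\delta}_1(\Theta_{\varepsilon_i}, \mathcal{K}_{\varepsilon_i})$, $|\theta_{i1} - \theta^*_{m1}| < \varepsilon_i/2$,
$|\rho_{i1} - \rho^*_{m1}| < \varepsilon_i/2$. The best line in $\mathcal{L}^{\delta}_1(\Theta_{\varepsilon_i}, \mathcal{K}_{\varepsilon_i})$ is in $E_i$, and the
best line in $\mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$, $\forall\, \Theta \ge \Theta_{\varepsilon_i}$, $\forall\, \mathcal{K} \ge \mathcal{K}_{\varepsilon_i}$, will also be in $E_i$ (the sets $E_i$, $i = 1, 2, \ldots$,
are nested sets).

Let the best line in $\mathcal{L}^{\delta}_1(\Theta_{\varepsilon_i}, \mathcal{K}_{\varepsilon_i})$ be $L(\theta^*_i, k^*_i)$. Note that $\theta^*_i \to \theta^*_{m1}$ and $k^*_i \to k^*_{m1}$
as $i \to \infty$. That is, if $L(\theta^*, k^*)$ is the best line in $\mathcal{L}^{\delta}_1(\Theta, \mathcal{K})$, then $\theta^* \to \theta^*_{m1}$ and
$k^* \to k^*_{m1}$ as $\Theta \to \infty$ and $\mathcal{K} \to \infty$.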

In  the  above  proof  it  was  stated  that  the  sets  Ei,  i  =  1,2,...,  exist  because  the 
optimum  is  assumed  to  be  unique.  We  now  prove  this. 


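Proposition 4.4  Define $E_i = \{(\theta, k) : d((\theta, k), (\theta^*_{m1}, k^*_{m1})) < 1/j_i\}$, where $j_i$ is chosen
so that if $(\theta_{1i}, k_{1i}) \in E_i$ and $(\theta_{2i}, k_{2i}) \in E_i^c$ then $f(\theta_{1i}, k_{1i}) > f(\theta_{2i}, k_{2i})$, and $j_i \to \infty$
as $i \to \infty$. Assume that $f : [0, \pi] \times [-M, M]$ is continuous and has a unique maximum.

Then such sets $E_i$ exist.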


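We prove this by contradiction. For each $i$, $E_i$ constitutes an open disk containing
$(\theta^*_{m1}, k^*_{m1})$. From topology we know that if $A$ is any open set containing
$(\theta^*_{m1}, k^*_{m1})$ then $f(A) \subset [f(\theta^*_{m1}, k^*_{m1}) - \epsilon,\ f(\theta^*_{m1}, k^*_{m1}) + \epsilon]$ for some $\epsilon > 0$.

So suppose not. Then for all open sets $A$ containing $(\theta^*_{m1}, k^*_{m1})$, if $(\theta_a, k_a) \in A$ then $f(\theta_a, k_a) < f(\theta_b, k_b)$
for some point $(\theta_b, k_b) \notin A$. But $f(E_i) \to f(\theta^*_{m1}, k^*_{m1})$ as $i \to \infty$, where

$$f(\theta^*_{m1}, k^*_{m1}) = \max\{f(\theta_m, k_m) : L(\theta_m, k_m) \in V_i,\ m = 1, 2, \ldots\}$$

and is unique. Contradiction. $\blacksquare$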


4.2.1  Remarks 

1. If we choose both $\Theta_{i_0}$ and $\mathcal{K}_{i_0}$ to be large, so that $|\theta_i - \theta_{i+1}|$ and $|k_i - k_{i+1}|$
are both small, then the maximal line $L(\theta^*_{i_0}, k^*_{i_0})$ will be close to the optimal line
$L(\theta^*_{m1}, k^*_{m1})$.

2. In developing our genetic algorithm, it seems logical to start with an initial choice
for $(\Theta_{i_0}, \mathcal{K}_{i_0})$ and run the algorithm for a finite number of iterations, resulting in
an approximation of $L(\theta^*_{i_0}, k^*_{i_0})$. If $\Theta_{i_0}$ and/or $\mathcal{K}_{i_0}$ is small, it
is possible for this approximation to be close to $L(\theta^*_{i_0}, k^*_{i_0})$ in terms of probability but
not close to the optimal line $L(\theta^*_{m1}, k^*_{m1})$ in terms of Euclidean distance. Since it is
unknown whether given values for $\Theta$ and $\mathcal{K}$ are 'small' or 'large', we will start by
searching, given $(\Theta_{i_0}, \mathcal{K}_{i_0})$, for an approximation that is close to $L(\theta^*_{i_0}, k^*_{i_0})$ in terms
of probability, and then choose subsequent $(\Theta_i, \mathcal{K}_i)$ so that our approximations
move closer to $L(\theta^*_{m1}, k^*_{m1})$ in terms of Euclidean distance.


We  now  would  like  to  extend  this  theory  to  the  case  where  the  number  of  lines  ko  >  1, 
ko  known. 


4.3  Case $k_0 = n_0$, $n_0$ known, $n_0 > 1$

We consider using genetic algorithms to fit the minimum least-squares $k_0$-piecewise
linear function to the data set $(x, y)$ where $k_0 = n_0$, $n_0$ known. The fitted values $\hat{y}_{0m}$
of the optimal function satisfy

yj  ~  kjno))  ^{9jnof^jno)  ^  •^nol  (f®) 

where 

^{9jnoi^jno)  |a:=  L{9j^ng,  ^jino)  |i  fo^  ^  X  <  Xr<fj^g-^  = 

^^>(2no-i)  =  +  1  for  i  =  1, . . . ,  2(no  -  1). 

Note  that  =  {xNj^g-,,XN.^^^, . .  ■  depends  on  the  function  L{9jng,'kjno)- 

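Our search space is

$$\mathcal{L}_{n_0} = \{L_{jn_0}(x) : L_{jn_0}(x) \in \mathcal{C}_{n_0},$$
$$L_{jin_0} \text{ is of the form } x\cos\theta_{ji} + y\sin\theta_{ji} = l_{\theta_{ji}} + k_{ji}\delta \ \ \forall\, i = 1, \ldots, n_0,$$
$$\theta_{ji} \text{ one of the } 2^{l_a} \text{ representable angles},\ k_{ji} \in \{0, 1, \ldots, 2^{l_d} - 1\},\ \delta = diag/(2^{l_d} - 1)\}.$$

We will only consider those $L_{jn_0}(x) \in \mathcal{C}_{n_0}$ for which:

1. $-\cos\theta_{j(i)}/\sin\theta_{j(i)} \neq -\cos\theta_{j(i+1)}/\sin\theta_{j(i+1)}$ $\forall\, i = 1, \ldots, n_0$ (no adjacent parallel
lines).

2. Let $z_{(j1)}, \ldots, z_{(j(n_0-1))}$ be the intersection points of $L_{jn_0}(x)$. Then
$x_{N_{j(2i-1)}} \le z_{(ji)} < x_{N_{j(2i)}}$ for $i = 1, \ldots, (n_0 - 1)$.

As before, let $L_{jn_0}(x)$ be denoted as $L(\theta_{jn_0}, k_{jn_0})$.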


The theory for the case $k_0 = 1$ can be extended to this case. Each string can be
designed to represent an $n_0$-piecewise function $L(\theta_{jn_0}, k_{jn_0}) \in \mathcal{L}_{n_0}$ satisfying the above
assumptions; note that each string will resemble a combination of $n_0$ strings from the
$k_0 = 1$ case. For example, if $n_0 = 3$, $l_a = 3$, and $l_d = 5$, then a string may look like

    001100110100010101101101

which is read as

    angle 1   distance 1   angle 2   distance 2   angle 3   distance 3
    001       10011        010       00101        011       01101

We then employ a similar GA optimization procedure to find the string which represents
the minimal least-squares $n_0$-piecewise function.
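The string layout in the example above can be illustrated with a short sketch. It assumes, as the example suggests, that each piece contributes $l_a$ angle bits followed by $l_d$ distance bits; the function name `split_chromosome` is hypothetical and only the integer indices are recovered, not the angles or distances themselves.

```python
def split_chromosome(bits, n_pieces, l_a, l_d):
    """Split a bit string into (angle_index, distance_index) pairs, one per piece.

    ASSUMPTION: each piece is stored as l_a angle bits followed by l_d distance bits.
    """
    gene_len = l_a + l_d
    if len(bits) != n_pieces * gene_len:
        raise ValueError("string length must equal n_pieces * (l_a + l_d)")
    pieces = []
    for i in range(n_pieces):
        gene = bits[i * gene_len:(i + 1) * gene_len]
        angle_index = int(gene[:l_a], 2)      # first l_a bits encode the angle
        distance_index = int(gene[l_a:], 2)   # remaining l_d bits encode k
        pieces.append((angle_index, distance_index))
    return pieces


example = "001100110100010101101101"
pieces = split_chromosome(example, n_pieces=3, l_a=3, l_d=5)
print(pieces)
```

For the 24-bit example string this yields angle indices 1, 2, 3 and distance indices 19, 5, 13.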

The optimization procedure can alternatively be viewed as a two-step process: for
each possible choice of $x_{N_j}$, say $x^i_{N_j}$, find the optimal choice for $L(\theta_{jn_0}, k_{jn_0})$, say
$L^i(\theta_{jn_0}, k_{jn_0})$. Then, from the set of all functions $\{L^i(\theta_{jn_0}, k_{jn_0})\}_{i \ge 1}$, select the
optimal function, say, $L^0(\theta_{jn_0}, k_{jn_0})$. If we let $X_{N_j}$ be the set of possible values for
$x_{N_j}$ and let $\{L(\theta_{jn_0}, k_{jn_0})\}_i$ denote the set of all $n_0$-piecewise functions whose pieces
intersect in such a way that $x^i_{N_j}$ satisfies the above, then

$$f(L^0(\theta_{jn_0}, k_{jn_0})) = \max_i \{\max \{f(L(\theta_{jn_0}, k_{jn_0})) : L(\theta_{jn_0}, k_{jn_0}) \in \{L(\theta_{jn_0}, k_{jn_0})\}_i\}\} \qquad (11)$$
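The two-step view can be sketched in code. The following is a minimal illustration, not the GA itself: it enumerates every admissible split of the sorted data exhaustively, fits an ordinary least squares line to each segment, and keeps the split with the smallest total squared error. It does not enforce the requirement that adjacent pieces intersect between the breakpoints, and the names `fit_line` and `best_partition` are hypothetical.

```python
from itertools import combinations


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    sse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def best_partition(xs, ys, n_pieces, min_pts=2):
    """Step 1: fit each segment for every split. Step 2: keep the best split.

    ASSUMPTION: xs is sorted and every segment holds at least min_pts points.
    Returns (total_sse, cut_indices, fits) or None if no split is admissible.
    """
    n = len(xs)
    best = None
    for cuts in combinations(range(min_pts, n - min_pts + 1), n_pieces - 1):
        bounds = (0,) + cuts + (n,)
        if any(b - a < min_pts for a, b in zip(bounds, bounds[1:])):
            continue
        fits = [fit_line(xs[a:b], ys[a:b]) for a, b in zip(bounds, bounds[1:])]
        total = sum(f[2] for f in fits)
        if best is None or total < best[0]:
            best = (total, cuts, fits)
    return best


xs = [0, 1, 2, 3, 4, 5]
ys = [0, 1, 2, 2, 1, 0]
total_sse, cuts, fits = best_partition(xs, ys, n_pieces=2)
print(cuts, [round(f[0], 6) for f in fits])
```

The number of splits grows combinatorially with the number of pieces, which is one reason a stochastic search such as a GA is attractive for this problem.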
4.4  Further Remarks and Discussion


4.4.1  Fitness  Function 

In equation (10) the fitness function for our genetic algorithm was stated as

$$\Big(\sum_{i=1}^{n_0} \sum_{m=N_{0(i-1)}}^{N_{0i}-1} (\hat{y}_{0m} - y_m)^2\Big)^{-1}$$

If all of the data points fall on a line (or on several linear segments), however, it is
possible for $\sum_{i=1}^{n_0} \sum_{m=N_{0(i-1)}}^{N_{0i}-1} (\hat{y}_{0m} - y_m)^2$ to equal zero. To avoid this case the above
fitness function may be modified by the addition of an arbitrary positive constant $\epsilon$,
yielding

$$\Big(\Big(\sum_{i=1}^{n_0} \sum_{m=N_{0(i-1)}}^{N_{0i}-1} (\hat{y}_{0m} - y_m)^2\Big) + \epsilon\Big)^{-1}$$
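The effect of the constant $\epsilon$ can be seen in a small sketch. The function name `fitness` and the value `1e-6` are illustrative assumptions; the source only requires $\epsilon$ to be an arbitrary positive constant.

```python
def fitness(fitted, observed, eps=0.0):
    """Inverse of the residual sum of squares, optionally regularised by eps."""
    sse = sum((f - y) ** 2 for f, y in zip(fitted, observed))
    return 1.0 / (sse + eps)


perfect = [1, 2, 3]

# With eps = 0 a perfect fit has zero squared error and the fitness is undefined.
try:
    fitness(perfect, perfect)
    undefined = False
except ZeroDivisionError:
    undefined = True

# A small positive eps keeps the fitness finite (and very large) for a perfect fit.
regularised = fitness(perfect, perfect, eps=1e-6)
print(undefined, regularised)
```

Because $\epsilon$ is added to every candidate's error, the ranking of candidates is unchanged; only the scale of the fitness values is affected.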


4.4.2  Assumptions 

In  the  above  theory,  we  have  assumed  that  the  optimal  number  of  lines  is  known. 
However,  in  most  cases,  the  optimal  number  of  lines  is  unknown  and  must  be  estimated. 
It  may  be  possible  to  generalize  the  above  theory  to  this  case  by  utilizing  a  genetic 
algorithm  which  allows  for  variable  string  lengths.  Then,  given  a  data  set,  the  algorithm 
could  select  the  optimal  number  of  pieces  (from  an  initial  set  of  possible  values)  as  well 
as  the  optimal  piecewise  function. 


4.4.3  Curve  Fitting 

In  this  paper  we  have  only  considered  the  fitting  of  piecewise  linear  functions.  It  is 
well  known  that  piecewise  linear  functions  can  be  used  to  approximate  a  curve  to  any 
degree  of  accuracy.  Hence  curve  fitting  can  be  seen  as  a  generalization  of  the  above 
problem.  Suppose  our  interest  was  in  fitting  the  optimal  curve  to  a  data  set.  An 
approach  to  this  problem  may  be  to  apply  the  above  theory,  given  a  set  of  points,  to 
find  the  piecewise  linear  function  which  best  approximates  the  optimal  curve.  The 
quality  of  the  approximation  would  be  influenced  by  the  number  of  pieces  as  well  as 
the  number  of  iterations. 


4.4.4  More  than  2  Dimensions 

In two dimensions, our interest is in fitting a $k_0$-piecewise linear function to a data set
$\{(x_1, y_1), \ldots, (x_N, y_N)\}$. A similar problem exists for data in $d_0$ dimensions, $d_0 > 2$;
namely, fitting a $k_0$-piecewise hyperplane to a data set $\{(x_1, y_1), \ldots, (x_N, y_N)\}$, where
$x_i = (x_{i1}, \ldots, x_{id_0})$ for $i = 1, \ldots, N$. To solve this problem using genetic algorithms,
we could consider extending the methods outlined above as follows:

Each string would represent an individual solution to the problem, i.e., a $k_0$-piecewise
hyperplane in $d_0$ dimensions. From geometry we know that a hyperplane in $\mathbb{R}^N$ may
be represented as

$$x_{iN} \cos\theta_{N-1} + \gamma_{N-1} \sin\theta_{N-1} = c$$

where

•  $(x_{i1}, \ldots, x_{iN})$ is a point on the hyperplane

•  $\gamma_{N-k} = x_{i(N-k)} \cos\theta_{N-(k+1)} + \gamma_{N-(k+1)} \sin\theta_{N-(k+1)}$ for $k = 1, \ldots, N - 1$

•  $\theta_{N-k}$ is the angle that the projection of the normal to the $(X_1, \ldots, X_{N-k+1})$
plane makes with the $X_{N-k+1}$ axis for $k = 2, \ldots, N - 1$

•  $\theta_{N-1}$ is the angle that the projection of the normal to the hyperplane makes with
the $X_N$ axis

•  $\theta_0$ is the angle that the projection of the normal to the $X_1$ plane makes with the
$X_1$ axis ($\theta_0 = 0$), and

•  $c$ is the perpendicular distance of the hyperplane from the origin.


To specify $c$ we use the hyper-rectangle hrect containing the points $\{(x_1, y_1), \ldots, (x_N, y_N)\}$,
just as we used the rectangle containing the points $(x, y)$ to specify $d$ in $\mathbb{R}^2$. Let $l_a$ be
the number of bits used to specify an angle, and $l_d$ be the number of bits used to specify
$c$. Let $H_{ji}$ represent the $i$th hyperplane, or piece, of the $j$th $k_0$-piecewise hyperplane.

Then for a given set of angles $\Theta_{ji} = \{\theta_{ji_0}, \ldots, \theta_{ji_{d_0-1}}\}$, with each $\theta_{ji_m}$, $m = 0, \ldots, d_0 - 1$,
one of the $2^{l_a}$ representable angles, we let $c = l_{\Theta_{ji}} + k_{ji}\delta$ where $l_{\Theta_{ji}}$ is the minimum distance of the origin from
one of the hyperplanes passing through a vertex of hrect, $diag$ is the maximum diagonal
of hrect, $k_{ji} \in \{0, 1, \ldots, 2^{l_d} - 1\}$, and $\delta = diag/(2^{l_d} - 1)$. Let $\Gamma_{ji} = \{\gamma_{ji_1}, \ldots\}$.

Then our discrete search space $\mathcal{H}_{k_0}$ may be written as

$$\mathcal{H}_{k_0} = \{H_{jk_0}(x_1, \ldots, x_N) : H_{jk_0}(x_1, \ldots, x_N) \in H_{k_0},\ H_{ji} \text{ is of the form}$$
$$x_{iN} \cos\theta_{N-1} + \gamma_{N-1} \sin\theta_{N-1} = l_{\Theta_{ji}} + k_{ji}\delta \ \ \forall\, i = 1, \ldots, k_0,$$
$$\Gamma_{ji} \text{ and } \Theta_{ji} \text{ are as specified above},\ k_{ji} \in \{0, 1, \ldots, 2^{l_d} - 1\}, \text{ and } \delta = diag/(2^{l_d} - 1)\}$$

We now use GAs to search $\mathcal{H}_{k_0}$ for the $k_0$-piecewise hyperplane which minimizes the
least-squares distance of the points $(x_i, y_i)$ from the hyperplane.
5  Implementation and Results


5.1  Case $k_0 = 1$

5.1.1  Data

The data set used is from Weisberg's text, Applied Linear Regression. The set
contains 17 data points that were collected in an experiment by James D. Forbes, a
Scottish physicist, designed to study the relationship between atmospheric pressure (in
inches of mercury) and boiling point (°F).


5.1.2  Genetic  Algorithm 

We used a fixed population size of $M = 10$ and a string length of $L = 20$, with 8
bits representing $\theta$ and 12 bits representing $k$; note that once $l_\theta$ and $\delta$ are known,
specifying $k$ is equivalent to specifying $d$. The single-point crossover probability, $p$, was
fixed at 0.8. The mutation probability $q$ varied with the iteration number over a range
of [0.0015, 0.5], either increasing or decreasing depending on the value of $N_{it}/N_{max}$,
where $N_{it}$ is the current iteration number and $N_{max}$ is the maximum number of
iterations. $N_{max}$ was set at 1500, at which time the maximum fitness value attained
and its corresponding string were reported. As stated previously, for a given string $s_j$
and the fitted values $\hat{y}_{ji}$ of the line it represents, the fitness value is given as

$$f(s_j) = \Big(\sum_{i=1}^{N} (\hat{y}_{ji} - y_i)^2\Big)^{-1} \qquad (12)$$

The  results  of  the  GA  were  compared  to  the  results  of  a  simple  linear  regression  program 
designed  to  fit  the  least  squares  line  to  the  data. 
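As an illustration of how a 20-bit string might be turned into a line and scored, the sketch below decodes 8 angle bits and 12 distance bits into the normal form $x\cos\theta + y\sin\theta = d$ and evaluates the inverse residual sum of squares. Several details are assumptions rather than statements from the text: angles are taken to be evenly spaced over $[0, \pi]$, $l_\theta$ is taken to be the smallest signed distance among lines of angle $\theta$ through a corner of the data rectangle, and vertical lines are given fitness zero. The names `decode_line` and `line_fitness` are hypothetical.

```python
import math


def decode_line(bits, rect, l_a=8, l_d=12):
    """Decode a bit string into (theta, d) for x*cos(theta) + y*sin(theta) = d.

    rect = (xmin, xmax, ymin, ymax) is the rectangle containing the data.
    """
    xmin, xmax, ymin, ymax = rect
    angle_index = int(bits[:l_a], 2)
    k = int(bits[l_a:l_a + l_d], 2)
    # ASSUMPTION: the 2**l_a angles are evenly spaced over [0, pi].
    theta = math.pi * angle_index / (2 ** l_a - 1)
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    # ASSUMPTION: l_theta is the smallest signed distance over the four corners.
    l_theta = min(x * math.cos(theta) + y * math.sin(theta) for x, y in corners)
    delta = math.hypot(xmax - xmin, ymax - ymin) / (2 ** l_d - 1)
    return theta, l_theta + k * delta


def line_fitness(theta, d, xs, ys):
    """Inverse residual sum of squares for the line given in normal form."""
    s = math.sin(theta)
    if abs(s) < 1e-12:
        return 0.0  # ASSUMPTION: a vertical line cannot predict y, so score it 0
    slope, intercept = -math.cos(theta) / s, d / s
    sse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    return 1.0 / sse if sse > 0 else float("inf")


theta, d = decode_line("1" * 8 + "0" * 12, (0.0, 1.0, 0.0, 1.0))
score = line_fitness(3 * math.pi / 4, 1 / math.sqrt(2), [0, 1, 2], [1, 2, 4])
print(theta, d, score)
```

In the second call the normal-form parameters describe the line $y = x + 1$, which misses the last data point by 1, so the residual sum of squares and hence the fitness are both approximately 1.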


5.1.3  Experimental  Results 

The  proposed  algorithm  was  tested  on  the  data  described  in  Section  5.1.1.  The  results 
are  shown  in  Table  1  and  Figure  4.  The  results  of  the  GA  are  comparable  to  the  results 
of  the  least-squares  regression  program.  The  disparity  between  the  results  of  the  two 
methods  may  be  the  result  of,  for  example,  the  algorithm  failing  to  converge  (due  to 
an  insufficient  value  for  Nmax)  or  lack  of  precision  in  the  results  of  the  GA  (due  to 
insufficient  string  length). 


              N_it    function                  max_j f(s_j)

    Approx    1500    f(x) = 0.882x - 39.32     0.450

    Actual            f(x) = 0.895x - 42.14     0.464

Table 1: Results of Experiment 1
[Figure 4 plots the least squares line and the GA line against boiling point.]

Figure 4: Results of Experiment 1

5.2  Case $k_0 = n_0$, $n_0$ known, $n_0 > 1$

We will demonstrate this case when $k_0 = 3$.


5.2.1  Data 

An artificial data set was created by first selecting a 3-piecewise linear generating
function $g(x)$ whose value is given by

$$g(x) = \begin{cases} 2 + x, & -1 \le x < 0 \\ 2 - x, & 0 \le x < 1 \\ x, & 1 \le x \le 2 \end{cases} \qquad (13)$$

where $x = (-1, -0.96, -0.92, \ldots, 1.96, 2)$. For each value $x_i \in x$, the corresponding
vector $Y_i = (y_{i1}, \ldots, y_{i5})$ consists of 5 values randomly generated from a Normal$(g(x_i), 0.1)$
distribution.
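A data set of this kind could be generated along the following lines. This is a sketch only: the text does not say whether the 0.1 in Normal$(g(x_i), 0.1)$ is a variance or a standard deviation (it is treated as a standard deviation here), and the seed and the name `make_dataset` are illustrative.

```python
import random


def g(x):
    """3-piecewise linear generating function of equation (13)."""
    if x < 0:
        return 2 + x
    if x < 1:
        return 2 - x
    return x


def make_dataset(seed=0, reps=5, sd=0.1):
    """Return the x grid and a list of (x, y) samples, reps per grid point.

    ASSUMPTION: sd is the standard deviation of the noise.
    """
    rng = random.Random(seed)
    xs = [round(-1 + 0.04 * i, 2) for i in range(76)]  # -1, -0.96, ..., 2
    data = [(x, rng.gauss(g(x), sd)) for x in xs for _ in range(reps)]
    return xs, data


xs, data = make_dataset()
print(len(xs), len(data), xs[0], xs[-1])
```

The grid has 76 points, so the full data set contains 380 observations.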


5.2.2  Genetic  Algorithm 

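Each string or chromosome represents a 3-piecewise linear function

$$L(\theta_{j3}, k_{j3}) = \{L(\theta_{j13}, k_{j13}),\ L(\theta_{j23}, k_{j23}),\ L(\theta_{j33}, k_{j33})\}$$

over the range of $x$. Let $z_{(j1)}$ be the intersection point of $L_{j13}$ and $L_{j23}$ and let $z_{(j2)}$
be the intersection point of $L_{j23}$ and $L_{j33}$. We chose to consider only those piecewise
functions $L_{j3}$ for which

•  $z_{(j1)}$ and $z_{(j2)}$ exist (no adjacent parallel pieces)

•  $x_{(1)} < z_{(j1)} < z_{(j2)} < x_{(N)}$


The fitness value for a given string is then

$$\Big(\sum_{i=1}^{3} \sum_{m=N_{j(2(i-1))}}^{N_{j(2i-1)}} (\hat{y}_{jm} - y_m)^2\Big)^{-1} \qquad (14)$$

where $\{N_{j0}, \ldots, N_{j5}\}$ :

•  $x_{N_{j0}} = x_{(1)}$, $x_{N_{j5}} = x_{(N)}$

•  $x_{N_{j1}} \le z_{(j1)} < x_{N_{j1}+1} = x_{N_{j2}}$, $x_{N_{j3}} \le z_{(j2)} < x_{N_{j3}+1} = x_{N_{j4}}$


and

$$L(\theta_{j3}, k_{j3})|_x = \begin{cases} L(\theta_{j13}, k_{j13})|_x, & x_{N_{j0}} \le x \le x_{N_{j1}} \\ L(\theta_{j23}, k_{j23})|_x, & x_{N_{j2}} \le x \le x_{N_{j3}} \\ L(\theta_{j33}, k_{j33})|_x, & x_{N_{j4}} \le x \le x_{N_{j5}} \end{cases} \qquad (15)$$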
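The two admissibility conditions on the intersection points can be checked directly from the normal-form parameters. The sketch below assumes each piece is given as a pair $(\theta, d)$ with $x\cos\theta + y\sin\theta = d$ and solves the resulting 2x2 system by Cramer's rule; the names `intersection_x` and `is_admissible` and the parallelism tolerance are hypothetical.

```python
import math


def intersection_x(theta1, d1, theta2, d2):
    """x-coordinate where two normal-form lines meet, or None if parallel."""
    det = math.sin(theta2 - theta1)  # cos(t1)*sin(t2) - sin(t1)*cos(t2)
    if abs(det) < 1e-12:
        return None
    return (d1 * math.sin(theta2) - d2 * math.sin(theta1)) / det


def is_admissible(pieces, x_min, x_max):
    """True when consecutive pieces intersect and x_min < z_1 < ... < x_max."""
    zs = []
    for (t1, d1), (t2, d2) in zip(pieces, pieces[1:]):
        z = intersection_x(t1, d1, t2, d2)
        if z is None:
            return False  # adjacent parallel pieces
        zs.append(z)
    chain = [x_min] + zs + [x_max]
    return all(a < b for a, b in zip(chain, chain[1:]))


# The pieces y = 2 + x, y = 2 - x, y = x written in normal form.
r2 = math.sqrt(2)
generator = [(3 * math.pi / 4, r2), (math.pi / 4, r2), (3 * math.pi / 4, 0.0)]
z1 = intersection_x(*generator[0], *generator[1])
z2 = intersection_x(*generator[1], *generator[2])
print(z1, z2, is_admissible(generator, -1.0, 2.0))
```

For these three pieces the intersections fall at approximately $x = 0$ and $x = 1$, so the function is admissible on $[-1, 2]$. Note that the first and third pieces are parallel, which is allowed because they are not adjacent.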


5.2.3  Design  Modifications 

With $n_0 = 3$, it became evident that if we set $l_a = 8$ and $l_d = 12$ as above, so that each
string had length $L = 60$, the size of the population matrix and the number of iterations
required for convergence would make our approach computationally expensive. To
avoid this problem, the genetic algorithm was divided into hierarchical loops. The
modified algorithm can be described as follows:


•  Set the global parameters $M = 40$, $p = 0.8$, and $N_{it}$ = number of iterations per
loop = 3000.

•  Loop 1

1.  Choose $l_a = 2$ and $l_d = 5$ ($L = 21$) so that the 4 angles $a_1, \ldots, a_4$ and 10 $k$
values $k_1, \ldots, k_{10}$ that can be represented are evenly spaced over the ranges
$[\pi/4, \pi]$ and $[0, 2^5 - 1]$, respectively.

2.  Generate the initial population $Q$ so that all strings represent functions
which meet the above specifications for $z_{(j1)}$ and $z_{(j2)}$.

3.  Execute a genetic algorithm beginning with $Q$ to find the optimal string
$S_1 = (s_{1a_1}, s_{1d_1}, \ldots, s_{1a_3}, s_{1d_3})$.

4.  Create matrices $S_{opA}$ and $S_{opD}$ with three rows, where row $i$ represents
the optimal angle or optimal distance of piece $i$, $i = 1, \ldots, 3$. Place the
appropriate sections of $S_1$ into $S_{opA}$ and $S_{opD}$.

•  Loop 2

5.  Generate a new matrix $Q^*$, also with $l_a = 2$ and $l_d = 5$, so that, given that $a_i$
and $k_i$ are the optimal angle and distance most recently selected for piece $i$,
the 4 possible angles and 10 possible $k$ values for piece $i$ represented in $Q^*$
are evenly spaced over the ranges $[a_i - \pi/4^2,\ a_i + 2\pi/4^2]$ and
$[(2k_i - 1)/2,\ k_i + 1]$.

6.  Repeat step 3 to find $S_2 = (s_{2a_1}, s_{2d_1}, \ldots, s_{2a_3}, s_{2d_3})$.

7.  Place the appropriate sections of $S_2$ into $S_{opA}$ and $S_{opD}$ (now $S_{opA_i} = (s_{1a_i}, s_{2a_i})$
and $S_{opD_i} = (s_{1d_i}, s_{2d_i})$).

•  For loop $j$, $j \ge 3$, repeat steps 5-7 where angle and distance values are now evenly
spaced over $[a_i - \pi/4^j,\ a_i + 2\pi/4^j]$ and $[(2k_i - 1)/2,\ k_i + 1]$.

•  When the desired degree of precision has been reached, the algorithm is stopped
and the matrices $S_{opA}$ and $S_{opD}$ contain the optimal piecewise function.
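The way the angle grid narrows from loop to loop can be sketched as follows. The sketch assumes the intervals read as $[\pi/4, \pi]$ in loop 1 and $[a_i - \pi/4^j,\ a_i + 2\pi/4^j]$ in loop $j \ge 2$, with 4 evenly spaced angles per loop ($l_a = 2$); the function name `angle_grid` is hypothetical, and the corresponding refinement of the distance values is not shown.

```python
import math


def angle_grid(center=None, loop=1, n_angles=4):
    """Evenly spaced candidate angles for one piece in the given loop.

    ASSUMPTION: loop 1 covers [pi/4, pi]; loop j >= 2 covers
    [center - pi/4**j, center + 2*pi/4**j] around the previous optimum.
    """
    if loop == 1:
        lo, hi = math.pi / 4, math.pi
    else:
        step = math.pi / 4 ** loop
        lo, hi = center - step, center + 2 * step
    width = (hi - lo) / (n_angles - 1)
    return [lo + i * width for i in range(n_angles)]


loop1 = angle_grid(loop=1)
best = loop1[1]                      # suppose pi/2 was selected in loop 1
loop2 = angle_grid(center=best, loop=2)
print([round(a, 4) for a in loop1])
print([round(a, 4) for a in loop2])
```

Under these assumptions the spacing shrinks by a factor of 4 per loop (from $\pi/4$ to $\pi/16$), and the previously selected angle reappears as the second grid point, so each loop adds precision without lengthening the string.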


Note  that  the  size  of  the  population  matrix  remains  constant  regardless  of  the  number 
of  loops  being  performed.  Hence  the  use  of  this  modified  version  of  a  GA  avoids  the 
manipulation  of  large  matrices,  reducing  the  required  computational  resources,  without 
adversely  affecting  the  precision  of  the  resulting  solution. 


5.2.4  Experimental  Results 

Table 2 shows the performance of the proposed GA-based algorithm on the artificial
data described in Section 5.2.1. The fitness value for the generating lines is stated for
purposes of comparison.


                 N_loops   function                              max_j f(s_j)

    Approx       2         f(x) = x + 1.996,   -1 ≤ x < 0        0.266836
                                  1.991 - x,    0 ≤ x < 1
                                  x - 0.019,    1 ≤ x ≤ 2

    Generator              g(x) = x + 2,       -1 ≤ x < 0        0.2633
                                  2 - x,        0 ≤ x < 1
                                  x,            1 ≤ x ≤ 2

Table 2: Results of Experiment 2


After  2  loops  and  only  6000  iterations,  the  GA  converged  to  a  3-piecewise  function 
with  a  greater  fitness  value  than  the  original  generating  lines. 


Figure  5:  Results  of  Experiment  2 


6  Conclusions  and  Future  Research 


We have conducted two experiments employing GAs for the fitting of piecewise linear
functions to data sets in $\mathbb{R}^2$. Our results demonstrate that GAs can yield near-optimal
results at limited computational expense.

These encouraging results suggest several directions for future research. Our
experiments involve cases where the number of lines is known. We would like to
design an algorithm which determines the optimal number of lines as well as their
placement. Many interesting problems involve data sets of more than 2 dimensions;
hence we would like to explore the use of GAs for fitting hyperplanes and other
multidimensional surfaces. The comparison of a multivariate GA for surface fitting
with existing methods, such as projection pursuit regression, is
certainly worth investigating.